feat(tools): mission-sandbox — tune retain/observation missions against a validator#2254

Draft
nicoloboschi wants to merge 1 commit into main from feat/mission-sandbox-tool

@nicoloboschi (Collaborator)

Draft — opening for early review of the tool's shape. Workspace wiring + CI are deliberate follow-ups (see Notes).

What

Adds hindsight-tools/mission-sandbox, a dev tool for tuning Hindsight's retain (extraction) and observation (consolidation) missions and verifying them against an external validator (e.g. the LOCOMO benchmark runner). It's a small, opinionated CLI (+ optional Next.js viewer UI).

The loop

A project is bound to a documents path + API config by init. You then iterate:

  1. Refine a mission from feedback (+ optional failing examples): retain mission <proj> --feedback "…" --example "…" (or observe mission).
  2. Apply: retain apply ingests the docs into a new versioned bank <proj>-vN; observe apply clears the observations and reconsolidates in place.
  3. Validate with your external validator (LOCOMO runner, default recall mode), and record the result with note.
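A minimal sketch of how `retain apply` might choose the next versioned bank name. The PR only states that each apply writes to a new bank `<proj>-vN`; the helper name `nextBankName` and the rule "one past the highest existing numeric version" are assumptions for illustration, not the tool's actual code.

```typescript
// Hypothetical helper: returns `<project>-v<N+1>`, where N is the highest
// numeric version among existing banks for this project (0 if none).
function nextBankName(project: string, existingBanks: string[]): string {
  const prefix = `${project}-v`;
  let maxVersion = 0;
  for (const bank of existingBanks) {
    if (!bank.startsWith(prefix)) continue;
    const suffix = bank.slice(prefix.length);
    // Only purely numeric suffixes count, so "demo-vx" or "demo-v2-old" are ignored.
    if (!/^\d+$/.test(suffix)) continue;
    maxVersion = Math.max(maxVersion, Number(suffix));
  }
  return `${prefix}${maxVersion + 1}`;
}

console.log(nextBankName("locomo", ["locomo-v1", "locomo-v3", "other-v9"])); // prints "locomo-v4"
```

Never reusing a bank name means every applied mission leaves a separate bank behind, so validator runs against earlier versions stay comparable.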

retain check gives a fast inner loop: it scores golden-set coverage of the current mission via chunk-level dry-run extraction (POST /memories/dry-run-extract, the endpoint added in #2205) — no re-ingest, nothing stored.
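The coverage score can be sketched as the fraction of golden facts that some chunk's dry-run extraction covers. This assumes the extracted facts are already available per chunk: the request and response shapes of POST /memories/dry-run-extract are not described here, so the sketch does not call it. The PR says the judging goes through Gemini, so the judge is an injected function. `GoldenFact`, `scoreCoverage` and the substring stand-in judge are hypothetical names.

```typescript
// A fact the current mission is expected to extract from the documents.
interface GoldenFact {
  id: string;
  text: string;
}

// Decides whether one golden fact is covered by the extracted facts.
// In the tool this is an LLM (Gemini) call; here it is injected.
type CoverageJudge = (golden: GoldenFact, extracted: string[]) => Promise<boolean>;

interface CoverageReport {
  covered: string[]; // ids of golden facts judged covered
  missed: string[];  // ids of golden facts judged missing
  score: number;     // covered / total; 0 for an empty golden set
}

async function scoreCoverage(
  golden: GoldenFact[],
  extractedPerChunk: string[][], // one list of dry-run facts per chunk
  judge: CoverageJudge,
): Promise<CoverageReport> {
  // Pool every chunk's facts: a golden fact counts as covered if any chunk produced it.
  const extracted: string[] = [];
  for (const chunk of extractedPerChunk) extracted.push(...chunk);

  const covered: string[] = [];
  const missed: string[] = [];
  for (const fact of golden) {
    if (await judge(fact, extracted)) covered.push(fact.id);
    else missed.push(fact.id);
  }
  const score = golden.length === 0 ? 0 : covered.length / golden.length;
  return { covered, missed, score };
}

// Offline stand-in judge (case-insensitive substring match), NOT the tool's Gemini judge.
const substringJudge: CoverageJudge = async (golden, extracted) =>
  extracted.some((fact) => fact.toLowerCase().includes(golden.text.toLowerCase()));
```

Keeping the judge pluggable lets the loop be tested offline with a deterministic matcher. A model-based judge would be swapped in for real runs.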

Commands: init / retain {mission,apply,check} / observe {mission,apply} / inspect / trace / curate / note / status / ui.
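The command surface splits into two grouped commands with fixed subcommands (retain, observe) and a set of flat commands. The hypothetical parser below sketches that shape. It is not the tool's actual CLI code, and names such as `parseCommand` are illustrative. One thing it makes explicit is that `observe check` is rejected, because only `retain` has a check step.

```typescript
// Grouped commands and their subcommands, as listed in the PR.
const SUBCOMMANDS = {
  retain: ["mission", "apply", "check"],
  observe: ["mission", "apply"],
} as const;

// Commands that take no subcommand.
const FLAT = ["init", "inspect", "trace", "curate", "note", "status", "ui"] as const;
type FlatCommand = (typeof FLAT)[number];

type Parsed =
  | { command: FlatCommand; args: string[] }
  | { command: keyof typeof SUBCOMMANDS; sub: string; args: string[] };

// Hypothetical parser: maps argv (without the program name) onto a command.
function parseCommand(argv: string[]): Parsed {
  const head: string | undefined = argv[0];
  const rest = argv.slice(1);
  if (head === undefined) throw new Error("missing command");

  if ((FLAT as readonly string[]).includes(head)) {
    return { command: head as FlatCommand, args: rest };
  }
  if (head === "retain" || head === "observe") {
    const sub: string | undefined = rest[0];
    const allowed: readonly string[] = SUBCOMMANDS[head];
    if (sub === undefined || !allowed.includes(sub)) {
      throw new Error(`usage: ${head} {${allowed.join(",")}}`);
    }
    return { command: head, sub, args: rest.slice(1) };
  }
  throw new Error(`unknown command: ${head}`);
}
```

For example, `parseCommand(["retain", "check", "--verbose"])` yields the retain/check variant with `["--verbose"]` as its remaining arguments.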

Notes / follow-ups (why draft)

  • Not yet wired as a root npm workspace (no package.json/lockfile change). The package is self-contained (@vectorize-io/hindsight-mission-sandbox, own package.json); cd hindsight-tools/mission-sandbox && npm install works. Happy to add the workspace entry + lockfile if we want it installed from root like hindsight-agent-sdk.
  • No CI job yet — can add one (typecheck + vitest) following the existing tool pattern.
  • Build artifacts (.next, standalone, dist, node_modules) and local project data (projects/) are gitignored; only source is committed.

Tests

vitest unit tests for the API client (hindsight.test.ts, incl. the dry-run-extract path) and the project store (store.test.ts): 8 passing. Mission refinement and coverage judging call Gemini with bounded exponential-backoff retries on transient 5xx responses (previously, a single 503 aborted a whole retain check).
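Bounded exponential backoff on transient 5xx errors can be sketched as a generic wrapper around any async call. This sketch makes three assumptions: failed calls throw an error carrying the HTTP status (`HttpError` here is hypothetical), only 5xx counts as transient, and the attempt limit and delays are illustrative values rather than the tool's settings.

```typescript
// Error type carrying the HTTP status so the retry wrapper can tell 5xx from 4xx.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
    // Keeps `instanceof HttpError` working even when compiled to ES5.
    Object.setPrototypeOf(this, HttpError.prototype);
  }
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first (the "bound")
  baseDelayMs: number; // wait before the 2nd attempt; doubles after each failure
  maxDelayMs: number;  // cap on any single wait
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Only 5xx responses are treated as transient; 4xx and other errors are not retried.
function isTransient(err: unknown): boolean {
  return err instanceof HttpError && err.status >= 500 && err.status <= 599;
}

async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!isTransient(err) || attempt >= opts.maxAttempts) throw err;
      const delay = Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
      await sleep(delay);
    }
  }
}
```

With this wrapper, a single 503 costs one short wait instead of aborting the whole check, while a 4xx (e.g. a bad request) still fails fast. The delays are deterministic here. Adding random jitter is a common refinement when many clients retry at once.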

…st a validator

Add hindsight-tools/mission-sandbox, a dev tool for iterating Hindsight's
retain (extraction) and observation (consolidation) missions and verifying
them against an external validator (e.g. the LOCOMO benchmark runner).

A project is `init`-bound to a documents path + API config. You refine a
mission from feedback (+ optional failing examples), then either
`retain apply` (ingest the docs into a new versioned bank `<project>-vN`)
or `observe apply` (clear observations + reconsolidate in place).
`retain check` scores golden-set coverage via chunk-level dry-run
extraction (POST /memories/dry-run-extract) — no re-ingest, nothing stored.

Commands: init / retain {mission,apply,check} / observe {mission,apply} /
inspect / trace / curate / note / status / ui (Next.js viewer).

Draft notes: workspace wiring (root package.json + lockfile) and a CI job
are intentional follow-ups; build artifacts (.next, standalone, dist,
node_modules) and local project data (projects/) are gitignored.